
Different ways to create multicategory variables

Multicategory variables can be coded in many different ways. The simplest solution is to use the commands generate and replace to code one category at a time, which works well when there are only a few categories. You start by giving every unit a starting value with generate. You then use replace commands, one command line per additional value, to change the value for the units that meet a given condition. The disadvantage is that many categories lead to many command lines and long scripts, which use more resources and take longer to run.
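
To see why the order of the replace lines matters, here is a minimal Python sketch (illustration only, not microdata.no syntax) that mimics the wealth coding in the script below for a single value. It assumes, as the overlapping conditions in the script imply, that each replace overwrites whatever value an earlier command set. The function name code_wealth() is hypothetical. In the real script each line is a separate command run over the whole dataset, which is presumably why long generate/replace chains take longer to run.

```python
def code_wealth(wealth: float) -> int:
    """Mimic generate + replace for one value: later replace lines overwrite earlier ones."""
    wealthint = 1              # generate wealthint = 1 (starting value)
    if wealth > 500_000:       # replace wealthint = 2 if wealth > 500000
        wealthint = 2
    if wealth > 1_000_000:     # replace wealthint = 3 if wealth > 1000000
        wealthint = 3
    if wealth > 1_500_000:     # replace wealthint = 4 if wealth > 1500000
        wealthint = 4
    return wealthint


values = [250_000, 500_000, 500_001, 1_200_000, 2_000_000]
print([code_wealth(v) for v in values])  # [1, 1, 2, 3, 4]
```

A wealth of 2 000 000 meets all three conditions and ends up in category 4 only because the last matching replace is applied last; reversing the order of the replace lines would change the result.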

If you want to code many categories, or need complicated conditions, the recode command is recommended. It lets you set up all the coding rules in a single command, which makes scripts more compact and faster to run. With recode you can, among other things, specify value intervals and create the associated value labels directly, so that you do not have to add them afterwards with the define-labels and assign-labels commands.
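
A recode statement can be thought of as an ordered list of (codes, new value, label) rules. The Python sketch below (illustration only, not microdata.no syntax) assumes two things the text does not state: that the first matching rule wins, as in Stata's recode and as the order of the birth-country rules in the script suggests, and that values matching no rule are left unchanged. The helper names matches() and recode() are hypothetical.

```python
from typing import List, Optional, Tuple, Union

# A matcher is a single code (int) or an inclusive interval (low, high),
# loosely mirroring "111" and "159/164" in recode syntax.
Matcher = Union[int, Tuple[int, int]]
Rule = Tuple[List[Matcher], int, str]  # (matchers, new value, value label)


def matches(value: int, matcher: Matcher) -> bool:
    """True if value equals a single code or lies inside an inclusive interval."""
    if isinstance(matcher, tuple):
        low, high = matcher
        return low <= value <= high
    return value == matcher


def recode(value: int, rules: List[Rule]) -> Tuple[int, Optional[str]]:
    """Apply the first matching rule; unmatched values are kept (assumption)."""
    for matchers, new_value, label in rules:
        if any(matches(value, m) for m in matchers):
            return new_value, label
    return value, None


# Two of the birth-country rules from the script, in the same order.
rules: List[Rule] = [
    ([111, 120, 138, 139, 140, 148, 155, 156, (159, 164)], 2, "European non-EU countries"),
    ([(101, 141), (144, 158)], 1, "EU/EEA"),
]

print(recode(111, rules))  # (2, 'European non-EU countries')
print(recode(106, rules))  # (1, 'EU/EEA')
print(recode(143, rules))  # (143, None)
```

Codes such as 111 and 148 appear both in the non-EU list and inside the EU/EEA intervals, so under the first-match assumption the order of the rules decides their group. Code 143 (Turkey) is unchanged here only because the Asia rule from the script is not modelled.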

A third option for writing coding conditions for multicategory variables is the functions inlist() and inrange(): inlist() tests whether a value is one of a listed set of values, and inrange() tests whether it lies within an interval. They are well suited to extensive coding conditions and are usually combined with generate and replace, for example when you want a rough grouping of municipalities and need to list large sets of municipality codes.
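
The behaviour of the two functions can be sketched in a few lines of Python (illustration only; these are not the microdata.no implementations). The sketch assumes that inrange() includes both bounds, as in Stata and as the adjacent intervals 1-200000 and 200001-400000 in the script suggest, and that municipality codes are stored as text, which the quoted codes in the script indicate.

```python
def inlist(value, *candidates) -> bool:
    """Sketch of inlist(): true if value equals one of the candidates."""
    return value in candidates


def inrange(value, low, high) -> bool:
    """Sketch of inrange(): true if low <= value <= high (both bounds included)."""
    return low <= value <= high


# Municipality codes are text, so "0301" and "301" are different values.
print(inlist("0301", "0301", "4601", "1103", "5001"))  # True
print(inlist("301", "0301", "4601", "1103", "5001"))   # False

# Adjacent, non-overlapping intervals as in the wage grouping in the script.
print(inrange(200000, 1, 200000))       # True
print(inrange(200000, 200001, 400000))  # False
```

Because both bounds are included, adjacent intervals must not share an endpoint, otherwise a boundary value would satisfy two conditions and the later replace would win.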

For those who need coding rules for a large number of categories, there is a fourth option: generating the recoding automatically by uploading a character-separated (CSV) recoding file. This option is described on a separate page.

You will find more information about generate, replace, recode, inlist(), inrange() and automatic recoding in chapters 3.1-3.2 of the User Guide.

This script demonstrates the different ways to code multicategory variables:

//Load version 23 of the SSB database under the alias ds
require no.ssb.fdb:23 as ds

create-dataset demo

//Code several categories (wealth groups) by using generate and replace
import ds/INNTEKT_BRUTTOFORM 2020-01-01 as wealth

//Starting value 1, then one replace per higher category (later lines overwrite earlier ones)
generate wealthint = 1
replace wealthint = 2 if wealth > 500000
replace wealthint = 3 if wealth > 1000000
replace wealthint = 4 if wealth > 1500000

tabulate wealthint


//Code several categories (world regions) by using recode
create-dataset population
import ds/BEFOLKNING_STATUSKODE 2021-01-01 as regstat
//Keep only persons with status code '1' (registered as resident)
keep if regstat == '1'

import ds/BEFOLKNING_FODELAND as birthcountry
tabulate birthcountry

//Convert the country codes from text to numbers so that numeric intervals such as 159/164 can be used
destring birthcountry
recode birthcountry (111 120 138 139 140 148 155 156 159/164 = 2 'European non-EU countries') (101/141 144/158 = 1 'EU/EEA') (203/393 = 3 'Africa') (143 404/578 = 4 'Asia incl. Turkey') (612 684 = 5 'North America') (601/775 = 6 'South and Central America') (802/840 = 7 'Oceania') (980 = 8 'Stateless') (990 = 9 'Unknown')
tabulate birthcountry


//Flag residents of the four largest municipalities (Oslo, Bergen, Stavanger, Trondheim) as "big city" by listing their municipality codes with inlist()
import ds/BOSATTEFDT_BOSTED 2021-12-31 as municipality
generate bigcity = 0
replace bigcity = 1 if inlist(municipality,'0301','4601','1103','5001')
tabulate municipality if bigcity, rowsort()


//Group yearly wage income into six groups by using inrange()
import ds/INNTEKT_LONN 2021-12-31 as wage

generate wage_gr = 0
replace wage_gr = 1 if inrange(wage,1,200000)
replace wage_gr = 2 if inrange(wage,200001,400000)
replace wage_gr = 3 if inrange(wage,400001,600000)
replace wage_gr = 4 if inrange(wage,600001,800000)
replace wage_gr = 5 if wage > 800000

define-labels wage_int 0 '0 kr' 1 '1 - 200 000 kr' 2 '200 001 - 400 000 kr' 3 '400 001 - 600 000 kr' 4 '600 001 - 800 000 kr' 5 '800 001 kr ->'
assign-labels wage_gr wage_int
tabulate wage_gr

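//Alternative way of grouping wage income by using recode (recode overwrites wage with the group codes)
//Set missing wages to 0 so that they get the label '0 kr' below
replace wage = 0 if sysmiss(wage)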
recode wage (1/200000 = 1)(200001/400000 = 2)(400001/600000 = 3)(600001/800000 = 4)(800001/max = 5)
assign-labels wage wage_int
tabulate wage
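
The script groups wage income twice, once with generate/replace/inrange() and once with recode, and the two versions are meant to give the same groups. A minimal Python sketch (illustration only; the function names are hypothetical) can compare them at every cut point. It assumes inclusive intervals, that the first matching recode rule wins and unmatched values are left unchanged, that max can be modelled as infinity, and that wages are non-missing whole kroner.

```python
def inrange(value, low, high):
    """Inclusive interval test (assumed behaviour of inrange())."""
    return low <= value <= high


def group_with_inrange(wage: int) -> int:
    """Mirrors the generate/replace/inrange() block in the script."""
    wage_gr = 0
    if inrange(wage, 1, 200_000):
        wage_gr = 1
    if inrange(wage, 200_001, 400_000):
        wage_gr = 2
    if inrange(wage, 400_001, 600_000):
        wage_gr = 3
    if inrange(wage, 600_001, 800_000):
        wage_gr = 4
    if wage > 800_000:
        wage_gr = 5
    return wage_gr


# (low, high, group) triples mirroring the recode rules; max is modelled as infinity.
RECODE_RULES = [
    (1, 200_000, 1),
    (200_001, 400_000, 2),
    (400_001, 600_000, 3),
    (600_001, 800_000, 4),
    (800_001, float("inf"), 5),
]


def group_with_recode(wage: int) -> int:
    """Mirrors the recode block: first matching interval wins, others unchanged (assumed)."""
    for low, high, group in RECODE_RULES:
        if low <= wage <= high:
            return group
    return wage


# Boundary values around every cut point, plus 0 and a high wage.
test_wages = [0, 1, 200_000, 200_001, 400_000, 400_001,
              600_000, 600_001, 800_000, 800_001, 2_500_000]
for w in test_wages:
    print(w, group_with_inrange(w), group_with_recode(w))

print(all(group_with_inrange(w) == group_with_recode(w) for w in test_wages))  # True
```

Under these assumptions the two versions agree for all non-negative whole-krone amounts, and the check confirms that group 5 starts at 800 001 kr. They would differ for negative amounts, if the register contains any: the inrange() version puts them in group 0, while recode would leave them unchanged (group_with_recode(-500) returns -500).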